Marginal Asymptotics for the “Large p, Small n” Paradigm: With Applications to Microarray Data

Authors

  • Michael R. Kosorok
  • Shuangge Ma
Abstract

The “large p, small n” paradigm arises in microarray studies, where expression levels of thousands of genes are monitored for a small number of subjects. There is increasing demand for the study of asymptotics for the various statistical models and methodologies that use genomic data. In this article, we focus on one-sample and two-sample microarray experiments, where the goal is to identify significantly differentially expressed genes. We establish uniform consistency of certain estimators of marginal distribution functions, sample means and sample medians under the “large p, small n” assumption. We also establish uniform consistency of marginal p-values based on certain asymptotic approximations, which permits inference based on false discovery rate techniques. The effects of the normalization process on these results are also investigated. Simulation studies and data analyses are used to assess finite sample performance.
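To make the one-sample setting concrete, the following is a minimal sketch, not the authors' procedure. It assumes a standard normal approximation to each gene's one-sample t-statistic for the marginal p-values and uses the Benjamini-Hochberg step-up rule as one example of a false discovery rate technique. The function names, the simulated data, and all parameter values (`p`, `n`, `n_signal`, `shift`, `q`) are hypothetical choices made for illustration.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def marginal_p_values(data):
    """Two-sided p-values, one per gene, for H0: mean expression = 0.

    `data` is a list of p rows (genes), each holding n observations (subjects).
    Assumption: the t-statistic is compared with a standard normal law,
    which is only an asymptotic approximation when n is small.
    """
    std_normal = NormalDist()
    p_values = []
    for row in data:
        n = len(row)
        t_stat = math.sqrt(n) * mean(row) / stdev(row)
        p_values.append(2.0 * (1.0 - std_normal.cdf(abs(t_stat))))
    return p_values


def benjamini_hochberg(p_values, q=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure at level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: keep the largest rank whose p-value is under its threshold.
        if p_values[idx] <= q * rank / m:
            largest_k = rank
    return sorted(order[:largest_k])


def simulate(p=2000, n=20, n_signal=50, shift=2.0, seed=0):
    """Hypothetical data: the first `n_signal` genes have mean `shift`, the rest 0."""
    rng = random.Random(seed)
    data = []
    for gene in range(p):
        mu = shift if gene < n_signal else 0.0
        data.append([rng.gauss(mu, 1.0) for _ in range(n)])
    return data


if __name__ == "__main__":
    data = simulate()  # large p (2000 genes), small n (20 subjects)
    rejected = benjamini_hochberg(marginal_p_values(data), q=0.05)
    print(len(rejected), "genes flagged as differentially expressed")
```

With n this small, the normal approximation is cruder than a t reference distribution and tends to give smaller p-values, so the sketch should be read as an illustration of the workflow rather than a calibrated analysis.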

Similar resources

Classification and Biomarker Genes Selection for Cancer Gene Expression Data Using Random Forest

Background & objective: Microarray and next generation sequencing (NGS) data are important sources for finding helpful molecular patterns. Also, the large volume of gene expression data increases the challenge of identifying the biomarkers associated with cancer. The random forest (RF) is used to effectively analyze the problems of large-p and smal...

Local likelihood regression in generalized linear single-index models with applications to microarray data

Searching for an effective dimension reduction space is an important problem in regression, especially for high dimensional data such as microarray data. A major characteristic of microarray data consists in the small number of observations n and a very large number of genes p. This “large p, small n” paradigm makes the discriminant analysis for classification difficult. In order to offset this...

A new test for sphericity of the covariance matrix for high dimensional data

AMS subject classifications: 62H10, 62H15. Keywords: covariance matrix; hypothesis testing; high-dimensional data analysis. Abstract: In this paper we propose a new test procedure for sphericity of the covariance matrix when the dimensionality, p, exceeds that of the sample size, N = n + 1. Under the assumptions that (A) 0 < trΣ the concentration, a new statistic is developed utilizing the rat...

The False Discovery Rate in Simultaneous Fisher and Adjusted Permutation Hypothesis Testing on Microarray Data

Background and Objectives: In recent years, new technologies have led to the production of large amounts of data, and in the field of biology microarray technology has also developed dramatically. Meanwhile, the Fisher test is used to compare the control group with two or more experimental groups and also to detect the differentially expressed genes. In this study, the false discovery rate was investiga...

Investigation on metabolism of cisplatin resistant ovarian cancer using a genome scale metabolic model and microarray data

Objective(s): Many cancer cells show significant resistance to drugs that kill drug sensitive cancer cells and non-tumor cells and such resistance might be a consequence of the difference in metabolism. Therefore, studying the metabolism of drug resistant cancer cells and comparison with drug sensitive and normal cell lines is the objective of this research. Material and Methods: Metabolism of c...

Publication date: 2005